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Abstract 

We show in this paper that third- and fourth-order low storage Runge-Kutta algo- 
rithms can be built specifically for quadratic nonlinear operators, at the expense 
of roughly doubling the time needed for evaluating the temporal derivatives. The 
resulting algorithms are especially well suited for computational fluid dynamics. 
Examples are given for the Henon-Heiles Hamiltonian system and, in one and two 
space dimensions, for the Burgers equation using both a pseudo-spectral code and 
a spectral element code, respectively. The scheme is also shown to be practical 
in three space solving the incompressible Euler equation using a fully parallelized 
pseudo-spectral code. 
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1 Introduction 



The explicit Runge-Kutta (ERK) method has a long and illustrious history in 
computational science and engineering. For example, these schemes are often 
selected in high spatial resolution studies of turbulence, in which the explicit 
nature of the scheme is used specifically in order to capture all time scales 
in the manner of direct numerical simulation (DNS). In the petascale era ex- 
tremely high spatial resolutions can be reached, and improved accuracy in the 
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time integration will also become necessary. The National Science Foundation 
has issued a baseline capability for its peta-scale initiative whereby a compu- 
tation of homogeneous turbulence on a uniform grid of 12288 3 points and at 
a Taylor Reynolds number R\ ~ 2000 should be accomplished in forty wall 
clock hours. In view of the large arrays to be stored for such a computation, 
and in view of the large number of operations required in order to arrive at 
a time of order unity required for this initiative, it is clear that an accurate, 
low-storage and reasonably simple scheme is imperative if this challenge is to 
be met. 



While there are many references to Runge-Kutta schemes in the scientific and 
engineering literature, we mention several that are of particular interest in this 
paper. It is well known that the standard fourth-order Runge-Kutta method 
can be evaluated using three levels of storage [I], and several low storage 
versions of Runge-Kutta methods are considered in [2]. In reference [3], new 
low storage methods adapted to acoustic problems is presented. Obviously, 
the stability of the integration schemes is an important issue. For high-order 
pseudo-spectral advection type problems this topic is also examined in de- 
tail in [1]; we consider here the problem of the stability of the algorithms 
empirically rather than from an analytical perspective. 



In this work we begin with an algorithm (see Eq. [5] in Sec. (12. 2p ) which is 
attributed to Jameson, Schmidt and Turkel [5] in the text of [6] . We do not find 
this algorithm in [5], and are thus uncertain why the latter text would make 
this attribution. Nevertheless, we refer in the following to the algorithm given 
by Eq. §5§ as the JST algorithm. Ostensibly, the JST algorithm requires only 
two levels of storage, but is of arbitrary order only for time-dependent linear 
problems. We show here that when the right hand side is nonlinear, corrections 
to the JST algorithm are needed if one wants to go beyond second order. The 
purpose of the present note is, after showing the limitations of the original 
algorithm, to compute the first corrections to the algorithm for quadratic 
nonlinearities as encountered for example in the modeling of incompressible 
flows. Our main result is that they can be implemented while preserving the 
low storage requirements. 



The paper is organized as follows. Section [2] contains the general formulation 
of our new time-stepping algorithm. Numerical applications to conservative 
and dissipative systems are provided in Section [31 Finally, in Section [4] we 
present our conclusion. 
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2 General formulation 



2.1 Basic definitions 

Our starting point will be the following equation of motion for the vector u 

^ = F(u), (1) 

with 

F(u) = L(u) + N(u,u). (2) 

In what follows, all that we will explicitly require of F is the linearity of L and 
the fact that N is quadratic and symmetric in its arguments. The following 
considerations will thus be valid for any L satisfying 

L(Aui + /iu 2 ) = AL(ui) + /iL(u 2 ) (3) 

and quadratic N satisfying 

N(ui,u a ) =N(u a ,ui). (4) 

The main applications we have in mind are high order spatial discretizations 
of the Navier-Stokes equation, the incompressible Euler equation (Eq. [TSj) . 
the magnetohydrodynamics (MHD) equations, or similarly nonlinear systems. 
Thus, the (real) vector u represents all of the (complex) components (modes) 
v a (k) in a pseudo-spectral treatment, or all nodal or modal values in a spectral 
element or other high-order discretization. In these cases L and N can be 
obtained readily from the relevant terms in the discrete systems. Naturally, 
the same schemes may be useful in low order or fixed truncation methods as 
well. 



2.2 JSTloop 



All of our new algorithms start by using the current value of the field u = u(i), 
to compute the order-s JST loop [6] 



u < U 

For k — s, 1, — 1 

F(u* 
u* <- u + At- 



k 

End For . (5) 



3 



The original order-s JST algorithm [6] simply amounts to setting, after the 
JST loop El 

u JST (t + At) = u*. (6) 

In the special case of linear F, where N = 0, this algorithm can be obtained 
directly by factorizing the Taylor expansion 

u(t + At) = u(t) + ]T(At) fc ^-ii + 0(At s+1 ) (7) 
k=i k - 

into 

u(t+At) = na+^)u(t) (8) 

and is therefore exact. However, it is easy to check by explicit computation 
that when N^O, errors are present, beginning at order 3 for s > 3. 

Indeed, defining the error term by 

8u = u JST (t + At) - u cxact (t + At) , (9) 

it is straightforward to obtain explicit Taylor expansions in time for both 
u exact (t + At) and u (t + At) respectively from the evolution equation (Eq. 
[[}) and the definitions (Eq. [5]) and (Eq. [6]), and compute the difference (Eq. 
IH]). For example, using obvious index notation for the rank-ra vectors u and F, 
we obtain for the i component when s > 3 

aj.3 n ap 

Note that this expression suggests that the local error is 0(At 3 ); in almost all 
problems of interest, we integrate to a finite time, so the global error will then 
be C(At 2 ). The basic idea of the new algorithms we propose below is to modify 
the JST loop in order to cancel the 0(At 3 ) term (and higher order terms) 
in (Eq. [TO]) that arises in the presence of nonlinear terms in the evolution 
equation. 



2.3 Correction terms 



In the following it will be convenient to vary the number of iterations of the 
JSP algorithm independently of the order of the desired algorithm. As a result, 
we will refer to calculations made with the original algorithm at arbitraty 
iteration count s as a JST-s scheme. Recall that for nonlinear problems, all 
these algorithms have second order global truncation errors. 

Using (Eq. [TDl we immediately arrive at a new 3 rd -order algorithm setting, 
after s = 3 JST iterations, 
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u 
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u 
u 



L(u) + N(u, u) 
2N(u,u) 

' At 3 



u 



21 



-u 



'11^ 



Empirically we find that by increasing the number of JST iterations at a given 
truncation order, the overall error of the result is reduced. This can be shown 
to be the result of partial cancellations in higher order terns in (Eq. [TUT) . For 
example, if we use s = 4 JST iterations (a JST-4 algorithm), and then apply 
the recipe (Eq. [Til , we reduce the errors still further, even though the scheme 
is still manifestly 3 rd order. We denote these cases with a +, where the number 
of + following the order indicate how many extra JST iterations were done. As 
a result, the 3 rd -order method with s = 4 is refered to as 3+ in the discussion 
below. 

Following the same procedure, we can compute 4 th -order correction terms. 
Higher order terms in (Eq. [TO]) can be canceled by making use of reasonable 
extra computational resources. Thus, a new 4 th -order algorithm requires that 
after the JST-4 algorithm we set 



u <- L(u) + N(u, u) 

u <- u + ^[L(u) + 2N(u*, u)] 

u «- 2N(u,u) 

* At 3 
u <— u H u 

24 

At 4 

uVu* + -[L(u)+2N(u*,u)] 

u <- u* . (121 



Again, if (Eq. [5]) is executed for 5 iterations before applying (Eq. [T2"j) . we 
generally see a reduction in the overall error; hence, this approach is called 
the 4+ scheme. 

Table [JJ contains a summary of some of the different possibilities. The rows 
correspond to the number s of JST loop iterations and the columns to the 
order of the correction terms (Eq. [TT|) or (Eq. [T2|) . The number of evaluations 
of nonlinear terms and the order of the resulting method are indicated. Extra 
+ symbols follow the previously defined convention and are also related to the 
amount of error present in the numerical examples (cf. Sec. 
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s 


No correction 


Third order 


JTULLi III (JICLci 




Order 




Order 






2 


2 nd 


2 


* 


* 


* * 


3 


2 nd + 


3 


3 rd 


5 


* * 


4 


2 nd + + 


4 


3 rd + 


6 


4 th 8 


Tab 


le 1 



Number of evaluations of nonlinear terms unl and order of the method obtained 
with s JST iterations (Eq. [5]) using, respectively, no correction, 3 rd order (Eq. [TTj) . 
and 4 th order (Eq. fT2j) correction terms. 



3 Numerical results 



We now test these new algorithms on both conservative and dissipative sys- 
tems. The conservative systems are the 2-degree-of-freedom classical mechan- 
ics Henon-Heiles system and the full 3-D spatially-periodic incompressible 
Euler equation. Two dissipative systems are described by the Burgers equa- 
tion. The first dissipative example uses a standard 1-D Fourier pseudo-spectral 
method, while the second a 2-D spectral element method. 



3.1 Conservative systems 



3.1.1 Henon-Heiles 

The Henon-Heiles Hamiltonian was introduced in 1964 [7] as a mathematical 
model of the chaotic motion of stars in a galaxy. It is one of the simplest 
Hamiltonian systems to display soft chaos in classical mechanics. The Henon- 
Heiles Hamiltonian reads 

E _ x 2 + y 2 x 2 + y 2 + 2x 2 y - ft/ 3 

The associated nonlinear nonintegrable canonical equations of motion that, of 
course, exactly conserve the total energy E are 



x = —x — 2xy , 

y = -y-x 2 + y 2 . (14) 

These equations are of the general quadratic form (Eq. [2]), so the Henon-Heiles 
system (Eq. [T4|) is thus perhaps one of the simplest non-trivial dynamical 
system in which to test our new algorithms. 

Figure [H displays the numerical time-evolution of the relative error in the 
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conserved energy E(t) (Eq. [T3|) . with At = .001 and initial data x(0) = 0, 
y(0) = 0.12, x(0) = 0.486239, y(0) = 0.018 corresponding to E = .125 for all 
cases. The error is computed as (E(t) — E )/E Q . 

It is apparent in the figure that the level of error decreases as the order of the 
method is increased. We note further that secular errors that are present in 
the third order method are canceled in the 3+ case. As expected, the fourth 
order conservation is much more precise than the 3+, in this case, at or below 
machine round-off. 
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Fig. 1. Relative error in energy versus time in the Henon-Heiles system. Since 
the system is conservative, the error should be zero. Equations (Eq. [T4l) are solved 
and the resulting errors (Eq. [T3|) are shown corresponding to 2 nd -order (3 JST 
iterations; solid), 3 rd -order (3 JST iterations with correction; dotted), 3+ (4 JST 
iterations with 3 rd -order correction; dash-dotted), and 4 th -order (4 JST iterations 
with 4 th -order corrections; dash). 



3.1.2 Three-dimensional incompressible Euler equations 

The (unit density) three-dimensional (3D) incompressible Euler equations, 



<9fV + (v • V)v = — Vp , 

V-v = 0, (15) 

obeyed by a spatially 27r-periodic velocity field can be approximated [6] by 
a (large) number of ordinary differential equations (ODEs) by performing 
a Galerkin truncation (v(k) = for |k| < A; max ) on the Fourier transform 
v(x,t) = Ev(k,t)e ikx . 

One thus needs to solve the finite system of ODEs for the complex variables 
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Fig. 2. (left) Energy conservation in terms of relative error versus time. The in- 
compressible Euler equation is solved using a dealiased pseudospectral method with 
resolution 64 3 . Results are shown for JST-2, -3, and -4 schemes (black, red dot- 
ted, and green dash-dotted curves, respectively), 3 rd -order (blue dashed), 3+ (cyan 
dotted), and 4 th -order (solid magenta), (right) Helicity conservation in terms of 
relative error versus time, with curves representing the same schemes as for energy 
conservation. 

v(k) (k is a 3D vector of relative integers (ki, k 2 , k 3 ) satisfying |k| < /c max ) 

d t v a (k,t) = --V a p y (k)Y,vp(p,t)v y (k - p,t) , (16) 
A p 

where V a ^ = kpP ai + /c 7 P Qj g with P a/3 = <5 Q/3 — k a kp/k 2 and the convolution 
in (Eq. [TO]) is truncated to |k| < /c max , |p| < k max and |k — p| < k max . This 
time-reversible system exactly conserves the kinetic energy E = J2k E(k, t) 
and helicity H = J2k H(k, t) where the energy and helicity spectra, E(k, t) and 
H(k, t), are denned by respectively averaging ||u(k', t)\ 2 and u(k', t)-Cj(— k', t) 
(with uj = V x u the vorticity) on spherical shells of width Ak = 1. 

Numerical solutions of equation (Eq. [16]) are efficiently computed using a 
pseudo-spectral general-periodic code [8119] with 64 3 Fourier modes that is 
dealiased using the standard 2/3 rule [TU] by spherical Galerkin truncation at 
kmax = 21. The code is fully parallelized with the message passing interface 
(MPI) library. 

The truncated Euler equation dynamics reaches, by way of progressive ther- 
malization [11], an absolute equilibrium that is a statistically stationary gaus- 
sian exact solution of the associated Liouville equation [T2] . Fig. [2] displays 
the energy and helicity conservation during this process. As in the previous 
section, the error in the conservation is measured as the relative difference in 
the energy (helicity) compared to that at t = 0, as in Fig. [TJ 

The first thing we notice is that the original JST-s scheme ([5]) has secular errors 
that are clearly visible for s = 2 and s = 3 but generally not as pronounced 
for s = 4 at late time (t > 6). We also observe that both JST-3 and the 



S 



10 



10 



10- 
At 



10 



-5 



-4.8 



-4.6 



At 



-4.4 



-4.2 



-4 



Fig. 3. (left) Absolute error of the slope of the Burgers front sup x (— d x v) at t = 2 ver- 
sus time-step At. Burgers equation with initial data sin(x) is solved using a dealiased 
pseudospectral method with resolution N = 64, v = 2ir/N. Results are shown for 
JST-4, (no corrections; circles), third order corrections (crosses), 3+ (squares), and 
fourth order corrections (diamonds). Straight lines show the slopes indicating con- 
vergence orders 2, 3 and 4. (right) Absolute error in the area under the Id curve 
(from [13]) vs time step for a spectral element N-wave solution to Burgers equation. 
Top red curve (slope 2.0017): JST-2 algorithm; black curve: JST-3 (slope 2.0181); 
green curve: 3 rd -order (slope 2.989). The blue curve (bottom) uses the 3+ algorithm 
(slope 2.9716). 

3 rd -order schemes behave monotonically up to a certain time, then begin to 
increase its global error. In fact, the global error of the 3 rd -order algorithm 
begins to exceed that of the JST-2 and JST-3 schemes at around t = 4. Before 
t ~ 2.5, however, we see convergence of the errors that behave with the new 
schemes roughly as they do in Fig. [TJ As expected, we see the lowest errors 
when the method is increased to 4 th -order; however, it is clear that part of the 
errors present in the pure 3 rd -order scheme are canceled in the 3+ case. Note 
that the 3+ and 4 th -order schemes yield errors that decrease with time for 
this problem, a feature which may become important for very long integration 
times. 

3.2 Dissipative systems 

3.2.1 ID Burgers equation with a pseudospectral calculation 
The ID Burgers equation 



is of the general form ([2J). It is solved here using a standard pseudo-spectral 
code with 64 Fourier modes, dealiased using the 2/3 rule [ID] . We run to 
t = 2 by which time a sharp front has formed for the chosen initial conditions 
u(x,t = 0) = sin x, a front whose width is related to the viscosity, v. Each 



d t u(x, t) + u(x, t)d x u(x, t) 



vd xx u(x,t), 



(17) 
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run is made with a different time step At, in order to check the truncation 
error as a function of At. We use as an error measure the absolute difference 
between the slope of the front in the numerical and analytical (see, e.g., |13j ) 
solutions. 

In Fig. [3] we present the Burgers front truncation errors. We see immediately 
that the errors decrease generally with an increase in the order of the scheme. 
In addition it is clear that here, as in Fig. [21 the 3+ scheme, while still 3 rd - 
order, can produce global errors that are significantly lower than the 3 rd -order 
scheme alone. 



3.2.2 ID Spectral elements 

The spectral element method [2] combines the flexibility of finite elements 
with the spectral convergence of the pseudo-spectral method. Functions are 
expanded in each element in terms of Lagrange interpolating polynomials (here 
the Gauss-Lobatto-Legendre polynomials), and C° continuity conditions are 
imposed on the element interfaces so that the functions reside in H 1 . The im- 
plementation we use is described in JT5], and draws heavily from works by other 
investigators [Tt3fl7lll8lll9f20j in order to develop a new dynamic /i-adaptive 
mesh refinement (AMR) formulation whereby the elements are subdivided 
isotropically according to a variety of a-posteriori refinement (and coarsen- 
ing) conditions. The implementation sets the same polynomial degree in all 
elements, although this is not required of the method, and the polynomial 
degree can be varied for each run. The code forms a framework for solving 
a variety of PDEs. In addition, it is scalable, and is parallelized also by way 
of MPI. For this work, we use the nonlinear advection-diffusion solver, which 
solves the multi-dimensional Burgers equation. The solver allows the use of 
semi-implicit and explicit time stepping methods; an existing 2 nd -order JST 
method was modified to include trivially the third order correction terms (Eq. 
[TTT) . with no additional storage. 



For this test, we solve the N-wave problem [T3J on a 2D mesh without adap- 
tivity. The 2D Cole-Hopf transformation 

u = -2vV\nx (18) 
transforms (Eq. [T7I) into a heat equation for x- Choosing a source solution [13] 

X(x, t) = 1 + - exp — — , 

we obtain the solution to (Eq. [T7j) immediately from (Eq. [T81 : 

(> OC X q CL / \ 

= 77 -77— r. 19) 
1 t a + texp((x-x ) 2 /4z/t) 
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This N-wave solution is essentially a ID solution, but solved using a 2D solver 
with and without the new high order time integration schemes. We choose 
v = 0.1, a = 0.01, and initial time t = 0.04, and integrate (Eq. IT71) to t — 0.06 
using a polynomials of degree 8 in each direction. In Fig. [3] we consider the 
absolute error in the area under the surface [13] vs time step in order to 
demonstrate the temporal convergence orders of the schemes. 



4 Conclusion 

We have shown that for quadratically nonlinear equations of motion the JST 
algorithm [5J, [6] needs corrections to go beyond second order. We have com- 
puted the correction terms that enable the JST algorithm to be used for 3 rd - 
and 4 th -order truncation errors in non-linear quadratic equations, and we show 
that by utilizing the original JST algorithm, the new algorithm up to 4 th -order 
can be implemented with low storage requirements. 

Numerical solutions to conservative and dissipative systems were used to verify 
the truncation errors of the new schemes, to verify their stability properties, 
and to demonstrate that they may be implemented easily using existing RK 
schemes. We considered implementations for a 3D pseudo-spectral incom- 
pressible Euler code, a ID pseudo-spectral Burgers code, and a 2D spectral 
element code. We find that the most cost effective method is the 3+ scheme, 
which requires 4 JST iterations and includes third order correction terms as 
described in the text. 

A natural question to ask is whether such high order explicit integration 
schemes are required. We hinted at an answer in discussing our 3D results: for 
problems which are highly resolved spatially, as in pseudo-spectral or other 
spectrally convergent discretizations, the time error can come to dominate the 
dynamics. This is clearly the case in Fig. [21 which shows that a low order 
time integration scheme integrated for a long time could produce spurious 
conservation properties, yielding an unphysical result. 
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